Abstract—This letter proposes a dictionary learning algorithm for blind one bit compressed sensing. In the blind one bit compressed sensing framework, the original signal to be reconstructed from one bit linear random measurements is sparse in an unknown domain. In this context, the product of the measurement matrix $\mathbf{A}$ and the sparse domain matrix $\boldsymbol{\Phi}$, i.e., $\mathbf{D} = \mathbf{A}\boldsymbol{\Phi}$, must be learned. Hence, we use dictionary learning to train this matrix. Towards that end, an appropriate continuous convex cost function is proposed for one bit compressed sensing, and a simple steepest-descent method is exploited to learn the rows of the matrix $\mathbf{D}$. Experimental results show the effectiveness of the proposed algorithm compared with the case of no dictionary learning, especially as the number of training signals and the number of sign measurements increase.

Index Terms—Compressed sensing, one bit measurements, dictionary learning, steepest descent.


I. Introduction 

One bit compressed sensing, which is the extreme case of quantized compressed sensing [?], has been extensively investigated recently [?]–[?]. According to compressed sensing (CS) theory, a sparse signal can be reconstructed from a number of linear measurements that may be much smaller than the signal dimension [?], [?]. Classical CS neglects the quantization process and assumes that the measurements are real and continuous valued. In practice, however, the measurements must be quantized to a finite number of discrete levels; this is known as quantized compressed sensing [?]. In the extreme case, there are only two discrete levels. This is called one bit compressed sensing, and it has recently gained much attention in the research community [?]–[?], especially in wireless sensor networks [?]. In the one bit compressed sensing framework, it has been proved that accurate and stable recovery can be achieved using only the signs of the linear measurements [?].

Many algorithms have been developed for one bit compressed sensing. A renormalized fixed-point iteration (RFPI) algorithm based on $\ell_1$-norm minimization was presented in [?]. A matching sign pursuit (MSP) algorithm was proposed in [?]. A binary iterative hard thresholding (BIHT) algorithm was introduced in [?] and has been shown to achieve better recovery performance than MSP. Moreover, the restricted-step shrinkage (RSS) algorithm devised in [?] has provable convergence guarantees.

In addition to noise-free settings, the sign measurements may be noisy. In this case, sign flips may occur, which degrade the performance. In [6], an adaptive outlier pursuit (AOP) algorithm is developed to detect the sign flips and reconstruct the signals with very high accuracy even when there are a large number of sign flips [?]. Moreover, the noise-adaptive RFPI (NARFPI) algorithm combines the ideas of RFPI and AOP [?]. In addition, [?] proposes a convex approach to the problem. Recently, one bit Bayesian compressed sensing [?] and a MAP approach [?] have been developed for solving the problem.

The basic assumption imposed by CS is that the signal is sparse in some domain, i.e., in a dictionary. The dictionary may be predefined, or it may be constructed for a class of signals. Dictionary learning algorithms attempt to find an adaptive dictionary for the sparse representation of a class of signals [?]. The most important dictionary learning algorithms are the method of optimal directions (MOD) [?] and K-SVD [?]. Some research works use dictionary learning in the CS framework [?], [?], [?]. However, to the best of our knowledge, dictionary learning in the one bit CS framework has not been investigated in the literature.

In this letter, similar to blind CS [20], we assume that the sparse domain is unknown in advance. In conventional one bit CS, we need to know both the measurement matrix $\mathbf{A}$ and the sparse domain matrix $\boldsymbol{\Phi}$ to form the product $\mathbf{D} = \mathbf{A}\boldsymbol{\Phi}$. In the sequel, however, we assume that the sparse domain matrix $\boldsymbol{\Phi}$ is unknown. Thus, we learn the matrix $\mathbf{D}$ by minimizing an appropriate cost function. Like most dictionary learning algorithms, the proposed algorithm iterates between two steps: the first step is one bit CS, and the second step is the dictionary update. Simulation results show the effectiveness of the proposed algorithm for reconstructing the sparse vector from one bit linear random measurements, especially when the number of training signals and sign measurements is large.

The rest of the letter is organized as follows. Section II introduces our proposed algorithm, including the problem formulation and the two steps of the algorithm. Simulation results are presented in Section III. Finally, conclusions are drawn in Section IV.
II. The Proposed Algorithm
A. Problem Formulation 

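Consider a signal $\mathbf{x}_i = \boldsymbol{\Phi}\mathbf{s}_i \in \mathbb{R}^m$ that is sparse in an unknown domain $\boldsymbol{\Phi} \in \mathbb{R}^{m \times N}$, where $\mathbf{s}_i \in \mathbb{R}^N$ is a sparse vector. In one bit compressed sensing, only the signs of the linear random measurements are available, i.e.,

$$\mathbf{y}_i = \mathrm{sign}(\mathbf{A}\mathbf{x}_i + \mathbf{v}_i), \tag{1}$$

where $\mathbf{A} \in \mathbb{R}^{n \times m}$ is a random measurement matrix, $\mathbf{y}_i \in \{-1, +1\}^n$ is the sign measurement vector, and $\mathbf{v}_i \in \mathbb{R}^n$ is the measurement noise vector, which is assumed to be i.i.d. Gaussian with variance $\sigma_v^2$. We aim to estimate the sparse vector $\mathbf{s}_i$, and then $\mathbf{x}_i$, from the sign measurements

$$\mathbf{y}_i = \mathrm{sign}(\mathbf{D}\mathbf{s}_i + \mathbf{v}_i), \tag{2}$$

where hereafter the matrix $\mathbf{D} = \mathbf{A}\boldsymbol{\Phi} \in \mathbb{R}^{n \times N}$ is called the dictionary. Since the sparse domain $\boldsymbol{\Phi}$ is unknown in advance, the dictionary $\mathbf{D}$ is also unknown. We use training signals $\mathbf{y}_i$, $1 \le i \le T$, to learn the dictionary matrix $\mathbf{D}$ from sign measurements, where $T$ is the number of training signals. The overall problem of dictionary learning for one bit compressed sensing is to find the sparse matrix $\mathbf{S} = [\mathbf{s}_1|\mathbf{s}_2|\cdots|\mathbf{s}_T]$, and then $\mathbf{X} = [\mathbf{x}_1|\mathbf{x}_2|\cdots|\mathbf{x}_T]$, from the training matrix $\mathbf{Y} = [\mathbf{y}_1|\mathbf{y}_2|\cdots|\mathbf{y}_T]$ given by

$$\mathbf{Y} = \mathrm{sign}(\mathbf{D}\mathbf{S} + \mathbf{V}), \tag{3}$$

where $\mathbf{V} = [\mathbf{v}_1|\mathbf{v}_2|\cdots|\mathbf{v}_T]$ collects the noise vectors. After learning $\mathbf{D}$ with the proposed algorithm and finding the sparse matrix $\hat{\mathbf{S}}$, the estimate of $\boldsymbol{\Phi}$ is $\hat{\boldsymbol{\Phi}} = \mathbf{A}^{\dagger}\mathbf{D}$, where $\mathbf{A}^{\dagger} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T$ is the left inverse of $\mathbf{A}$ (it exists when $\mathbf{A}$ has full column rank, which holds almost surely for a Gaussian $\mathbf{A}$ with $n \ge m$). Finally, the estimate of the original signals is $\hat{\mathbf{X}} = \hat{\boldsymbol{\Phi}}\hat{\mathbf{S}}$.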
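The measurement model (1)–(3) and the left-inverse step can be made concrete with a short NumPy sketch. It is illustrative only: the function names are hypothetical, the default sizes and noise level are borrowed from the simulation setup of Section III, and giving each all-zero sparse column one active entry (so that unit-norm normalization is defined when $p = 0.01$) is our own assumption. Because the true $\mathbf{D}$ is used here, the left inverse recovers $\boldsymbol{\Phi}$ up to round-off; in the proposed method $\mathbf{D}$ would instead be the learned dictionary.

```python
import numpy as np

def generate_one_bit_data(m=50, N=100, n=100, T=100, p=0.01, sigma_v=0.01, seed=0):
    """Draw (A, Phi, S, D, Y) following (1)-(3); defaults borrowed from Section III."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, m))                              # measurement matrix, n x m
    Phi = rng.standard_normal((m, N))
    Phi /= np.linalg.norm(Phi, axis=0)                           # unit-norm atoms of the sparse domain
    S = (rng.random((N, T)) < p) * rng.standard_normal((N, T))   # Bernoulli-Gaussian columns
    # Assumption (not stated in the letter): give every all-zero column one active entry.
    for i in np.flatnonzero(~S.any(axis=0)):
        S[rng.integers(N), i] = rng.standard_normal()
    S /= np.linalg.norm(S, axis=0)                               # unit-norm sparse vectors
    D = A @ Phi                                                  # dictionary D = A Phi, n x N
    V = sigma_v * rng.standard_normal((n, T))                    # i.i.d. Gaussian noise
    Y = np.sign(D @ S + V)                                       # one bit measurements, as in (3)
    return A, Phi, S, D, Y

def left_inverse_estimate(A, D):
    """Phi_hat = (A^T A)^{-1} A^T D; valid when A has full column rank."""
    return np.linalg.solve(A.T @ A, A.T @ D)

A, Phi, S, D, Y = generate_one_bit_data()
Phi_hat = left_inverse_estimate(A, D)    # exact up to round-off because the true D is used
X_hat = Phi_hat @ S                      # estimate of X = Phi S
```

Solving the normal equations with `np.linalg.solve` mirrors the formula $(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T$ without forming the inverse explicitly.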


B. Two Steps of the Proposed Algorithm

Inspired by most dictionary learning algorithms [?], we divide the problem into two steps. The first step is sparse recovery from one bit measurements while the dictionary is fixed. The second step is the dictionary update while the sparse coefficients are fixed.

1) One bit compressed sensing (dictionary $\mathbf{D}$ fixed): Various algorithms have been proposed to solve the conventional one bit CS problem, such as BIHT [?], MSP [?], RFPI [?], and AOP [6], to name a few. Because of its simplicity, in this letter we use the BIHT algorithm to perform sparse recovery for all of the training signals. Note that the proposed dictionary learning algorithm can use any sparse recovery method from the one bit CS framework. For notational convenience, we denote this step by

$$\mathbf{s}_i = \mathrm{BIHT}(\mathbf{y}_i, \mathbf{D}), \quad 1 \le i \le T. \tag{4}$$

2) Dictionary update (sparse matrix $\mathbf{S}$ fixed): Since only the signs of the measurements are available, classical dictionary learning algorithms such as MOD [?] or K-SVD [?] cannot be used for the dictionary update. In order to learn the dictionary, we propose the following cost function for the one bit CS framework:

$$C(\mathbf{D}) = \sum_{i=1}^{T}\sum_{k=1}^{n} \mathrm{Ind}\big(y_{ik} - \mathrm{sign}(\mathbf{d}_k^T\mathbf{s}_i + v_{ik})\big), \tag{5}$$

where $\mathbf{d}_k^T$ is the $k$th row of the dictionary matrix $\mathbf{D}$, $y_{ik}$ and $v_{ik}$ are the $k$th entries of $\mathbf{y}_i$ and $\mathbf{v}_i$, and $\mathrm{Ind}(x)$ is the indicator function defined as

$$\mathrm{Ind}(x) = \begin{cases} 0, & x = 0,\\ \infty, & x \neq 0. \end{cases} \tag{6}$$

Therefore, the dictionary update step solves the optimization problem

$$\min_{\mathbf{D} \in \mathbb{R}^{n \times N}} C(\mathbf{D}), \tag{7}$$

where $C(\mathbf{D})$ is given in (5). The optimization problem in (7) can be divided into $n$ sub-problems, each of which finds one row $\mathbf{d}_k^T$ of $\mathbf{D}$:

$$\min_{\mathbf{d}_k \in \mathbb{R}^{N}} \sum_{i=1}^{T} \mathrm{Ind}\big(y_{ik} - \mathrm{sign}(\mathbf{d}_k^T\mathbf{s}_i + v_{ik})\big), \quad 1 \le k \le n. \tag{8}$$

To solve (8), we use continuous approximations of the two functions $\mathrm{Ind}(x)$ and $\mathrm{sign}(x)$. For the sign function, we use the continuous S-shaped function

$$\mathrm{sign}(x) \approx S(x) = \frac{1 - \exp(-x)}{1 + \exp(-x)}. \tag{9}$$

For the indicator function, inspired by the definitions of the $\ell_1$-norm and the $\ell_2$-norm, we define two approximations:

$$\mathrm{Ind}(x) \approx I(x) = \begin{cases} |x|, & \ell_1 \text{ indicator function},\\ x^2, & \ell_2 \text{ indicator function}. \end{cases} \tag{10}$$

Hence, the sub-optimization problem becomes

$$\min_{\mathbf{d}_k \in \mathbb{R}^{N}} F(\mathbf{d}_k) = \sum_{i=1}^{T} I\big(y_{ik} - S(\mathbf{d}_k^T\mathbf{s}_i + v_{ik})\big), \quad 1 \le k \le n. \tag{11}$$

Thanks to these approximations, the cost function $F(\mathbf{d}_k)$ in (11) is continuous. It can be shown that, neglecting the noise term and using the $\ell_2$-norm and $\ell_1$-norm indicator functions, the deterministic cost functions $J(\mathbf{D}) = \sum_{i=1}^{T}\|\mathbf{y}_i - S(\mathbf{D}\mathbf{s}_i)\|_2^2$ and $Q(\mathbf{D}) = \sum_{i=1}^{T}\|\mathbf{y}_i - S(\mathbf{D}\mathbf{s}_i)\|_1$, where $S(\cdot)$ acts entrywise, are convex with respect to $\mathbf{D}$ in the region where the sign measurements are consistent with $\mathbf{D}\mathbf{s}_i$, i.e., $y_{ik}\,\mathbf{d}_k^T\mathbf{s}_i > 0$. The proofs are given in Appendix A and Appendix B, respectively. Hence, both cost functions can be minimized by $n$ parallel simple steepest-descent recursions, each responsible for one row of $\mathbf{D}$. The recursion of the $k$th steepest descent is

$$\mathbf{d}_k \leftarrow \mathbf{d}_k - \mu \frac{\partial F(\mathbf{d}_k)}{\partial \mathbf{d}_k}. \tag{12}$$

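The partial derivative in (12), with the noise term neglected, is

$$\frac{\partial F(\mathbf{d}_k)}{\partial \mathbf{d}_k} = \sum_{i=1}^{T} I'\big(y_{ik} - S(\mathbf{d}_k^T\mathbf{s}_i)\big)\,\frac{\partial}{\partial \mathbf{d}_k}\big(-S(\mathbf{d}_k^T\mathbf{s}_i)\big), \tag{13}$$

where $I'(x)$ is the derivative of $I(x)$. Since $\partial S(\mathbf{d}_k^T\mathbf{s}_i)/\partial \mathbf{d}_k = S'(\mathbf{d}_k^T\mathbf{s}_i)\,\mathbf{s}_i$, substituting (13) into (12) results in the final recursion

$$\mathbf{d}_k \leftarrow \mathbf{d}_k + \mu \sum_{i=1}^{T} \mathbf{s}_i\, S'(\mathbf{d}_k^T\mathbf{s}_i)\, I'\big(y_{ik} - S(\mathbf{d}_k^T\mathbf{s}_i)\big), \tag{14}$$

where $S'(x)$ and $\mu$ are the derivative of $S(x)$ and the step-size parameter, respectively. In the case of the $\ell_2$-norm indicator function with $I(x) = x^2$, the final recursion is

$$\mathbf{d}_k \leftarrow \mathbf{d}_k + 2\mu \sum_{i=1}^{T} \mathbf{s}_i\, S'(\mathbf{d}_k^T\mathbf{s}_i)\, e_{ik}, \tag{15}$$

where $e_{ik} = y_{ik} - S(\mathbf{d}_k^T\mathbf{s}_i)$. Regarding the $\ell_1$-norm indicator function with $I(x) = |x|$, the steepest-descent recursion is

$$\mathbf{d}_k \leftarrow \mathbf{d}_k + \mu \sum_{i=1}^{T} \mathbf{s}_i\, S'(\mathbf{d}_k^T\mathbf{s}_i)\, \mathrm{sign}(e_{ik}). \tag{16}$$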
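To make (9), (11), (15), and (16) concrete, the following NumPy sketch evaluates the S-shaped function, the per-row cost $F(\mathbf{d}_k)$ with the noise term neglected, and one steepest-descent row update, and then checks the analytic gradient against central finite differences. It is an illustrative sketch, not the authors' code: the helper names are hypothetical, and $S(x)$ is evaluated through the identity $(1 - e^{-x})/(1 + e^{-x}) = \tanh(x/2)$ to avoid overflow for large negative $x$.

```python
import numpy as np

def S(x):
    """S-shaped surrogate (9): (1 - e^{-x}) / (1 + e^{-x}), computed as tanh(x/2)."""
    return np.tanh(x / 2.0)

def S_prime(x):
    """S'(x) = 2e^{-x} / (1 + e^{-x})^2, which equals (1 - S(x)^2) / 2."""
    return 0.5 * (1.0 - S(x) ** 2)

def row_cost(d_k, S_hat, y_k, indicator="l2"):
    """F(d_k) in (11) with the noise neglected; S_hat is N x T, y_k holds y_ik for i = 1..T."""
    e = y_k - S(d_k @ S_hat)
    return np.sum(e ** 2) if indicator == "l2" else np.sum(np.abs(e))

def row_update(d_k, S_hat, y_k, mu, indicator="l2"):
    """One steepest-descent step: recursion (15) for 'l2', recursion (16) for 'l1'."""
    z = d_k @ S_hat                                      # z_i = d_k^T s_i
    e = y_k - S(z)                                       # residuals e_ik
    g = 2.0 * e if indicator == "l2" else np.sign(e)     # I'(e_ik)
    return d_k + mu * S_hat @ (S_prime(z) * g)           # d_k + mu * sum_i s_i S'(z_i) I'(e_ik)

# Check on random data (hypothetical sizes): the update direction equals -dF/dd_k.
rng = np.random.default_rng(1)
S_hat = rng.standard_normal((5, 20))
y_k = np.sign(rng.standard_normal(20))
d_k = rng.standard_normal(5)
h = 1e-6
fd_grad = np.array([(row_cost(d_k + h * u, S_hat, y_k) - row_cost(d_k - h * u, S_hat, y_k)) / (2 * h)
                    for u in np.eye(5)])
analytic_grad = d_k - row_update(d_k, S_hat, y_k, mu=1.0)
```

With `mu=1.0`, the difference `d_k - row_update(...)` is exactly the gradient $\partial F/\partial \mathbf{d}_k$ for the $\ell_2$ indicator, so agreement with `fd_grad` confirms the factor $2\mu$ in (15).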


Therefore, the overall algorithm is a two-step iterative algo¬ 
rithm which iterates either between Q and ( |T5] l in the case of 
f^-norm indicator function or between (j^ and (16 1 regarding 
f^-norm indicator function. We call these two versions of our 
algorithm DL-BIHT-L2 and DL-BIHT-Ll, respectively. 
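A compact sketch of the full alternating scheme follows. It is a hypothetical illustration under several assumptions: BIHT is written in its common textbook form (a gradient step with step parameter $\tau$, keeping the $K$ largest-magnitude entries, and normalizing the result to unit norm), the sparsity level `K` is an input we introduce because it is not stated here, and all rows of $\mathbf{D}$ are updated simultaneously with a vectorized form of (15) or (16). The function name `dl_biht` is ours; the defaults of 40 outer iterations, $\mu = 1$, and 20 BIHT iterations mirror the values reported in Section III.

```python
import numpy as np

def S(x):
    """S-shaped surrogate (9), computed stably as tanh(x/2)."""
    return np.tanh(x / 2.0)

def S_prime(x):
    """Derivative of S: (1 - S(x)^2) / 2."""
    return 0.5 * (1.0 - S(x) ** 2)

def biht(y, D, K, n_iter=20, tau=1.0):
    """Textbook BIHT (assumed form): gradient step, keep K largest entries, unit-normalize."""
    s = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = s + 0.5 * tau * D.T @ (y - np.sign(D @ s))
        s = np.zeros_like(a)
        keep = np.argsort(np.abs(a))[-K:]       # indices of the K largest magnitudes
        s[keep] = a[keep]
    norm = np.linalg.norm(s)
    return s / norm if norm > 0 else s

def dl_biht(Y, D_init, K, n_outer=40, mu=1.0, indicator="l2", biht_iter=20, tau=1.0):
    """Alternate step (4) (BIHT on each column of Y) with the row updates (15) or (16)."""
    D = D_init.copy()
    for _ in range(n_outer):
        # Step 1: sparse recovery with D fixed, one BIHT call per training signal.
        S_hat = np.column_stack([biht(Y[:, i], D, K, biht_iter, tau) for i in range(Y.shape[1])])
        # Step 2: dictionary update with S_hat fixed; Z[k, i] = d_k^T s_i.
        Z = D @ S_hat
        E = Y - S(Z)                                        # residuals e_ik
        G = 2.0 * E if indicator == "l2" else np.sign(E)    # I'(e_ik)
        D = D + mu * (S_prime(Z) * G) @ S_hat.T             # every row k updated at once
    return D, S_hat   # S_hat is the last sparse estimate (computed before the final D update)

# Toy usage with hypothetical sizes and a perturbed initialization.
rng = np.random.default_rng(0)
n, N, T, K = 40, 60, 50, 2
D_true = rng.standard_normal((n, N))
S_true = np.zeros((N, T))
for i in range(T):
    S_true[rng.choice(N, K, replace=False), i] = rng.standard_normal(K)
S_true /= np.linalg.norm(S_true, axis=0)
Y = np.sign(D_true @ S_true)
D_init = D_true + 0.1 * rng.standard_normal((n, N))
D_hat, S_hat = dl_biht(Y, D_init, K, n_outer=5)
```

Passing `indicator="l1"` switches the update from (15) to (16); everything else in the loop is unchanged, which reflects the two-step structure described above.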


III. Simulation Results

This section presents the simulation results. In the simulations, the unknown sparse vector is drawn from a Bernoulli-Gaussian (BG) model with activity probability $p = 0.01$ and unit variance for the active samples. To resolve the amplitude ambiguity inherent in one bit compressed sensing, we normalize the sparse vector to have unit norm. The size of the signal vector is $m = 50$. The elements of the sensing matrix $\mathbf{A}$ are drawn from the standard Gaussian distribution $\mathcal{N}(0, 1)$. The elements of the sparse domain matrix $\boldsymbol{\Phi}$ are also drawn from a standard Gaussian distribution, and its columns are normalized to have unit norm. The number of atoms is $N = 100$. The additive noise $v_{ik}$ is Gaussian with distribution $\mathcal{N}(0, \sigma_v^2)$, where $\sigma_v = 0.01$. To initialize the dictionary, we use a perturbed version of $\mathbf{D}$, namely $\mathbf{D}_{\mathrm{init}} = \mathbf{D} + 0.1 \times \mathrm{randn}(n, N)$. For the BIHT algorithm, we use 20 iterations with the parameter $\tau = 1.0$.

In the first experiment, we examine the convergence behavior of the proposed cost function for different values of the step-size parameter $\mu$. The number of iterations is set to 40, which is sufficient for the convergence of the cost function in most of the simulated cases. The number of training signals is $T = 100$, and the number of sign measurements is $n = 100$. Fig. 1 shows the cost function $\sum_{i=1}^{T}\|\mathbf{y}_i - S(\mathbf{D}\mathbf{s}_i)\|_2^2$ versus the number of iterations for both DL-BIHT-L2 and DL-BIHT-L1 and for the three values $\mu = 0.1$, $\mu = 1$, and $\mu = 10$. Both DL-BIHT-L2 and DL-BIHT-L1 exhibit monotonically decreasing cost functions that reach their lowest values after about 40 iterations. Among the three step sizes, $\mu = 1$ leads to the fastest convergence, so we use this value in the following experiments.

[Plot: cost function versus iteration for $\mu = 0.1$, 1, and 10 with the L2 and L1 indicator functions.]
Fig. 1. Cost function versus the number of iterations.

In the second experiment, we use the normalized mean square error (NMSE) as the performance metric, defined as

$$\mathrm{NMSE} \triangleq 20\log_{10}\left(\frac{\|\hat{\mathbf{X}} - \mathbf{X}\|_F}{\|\mathbf{X}\|_F}\right), \tag{17}$$


where $\hat{\mathbf{X}}$ is the estimate of the true signal matrix $\mathbf{X}$. All NMSEs are averaged over 50 Monte Carlo (MC) simulations. The number of training signals varies between $T = 100$ and $T = 1000$, and the number of sign measurements is again $n = 100$. Fig. 2 shows the NMSE versus the number of training signals for DL-BIHT-L2, DL-BIHT-L1, and the case without dictionary learning (DL). When $T = 1000$, the dictionary learning algorithms outperform the case without dictionary learning by 4 dB. The proposed DL-BIHT-L2 also performs slightly better than DL-BIHT-L1, and the NMSE decreases as the number of training signals increases.
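The NMSE in (17) can be computed as below. This small sketch assumes the Frobenius norm over the whole signal matrix, and `nmse_db` is a hypothetical helper name.

```python
import numpy as np

def nmse_db(X_hat, X):
    """NMSE (17) in dB: 20*log10(||X_hat - X||_F / ||X||_F)."""
    return 20.0 * np.log10(np.linalg.norm(X_hat - X) / np.linalg.norm(X))

X = np.ones((4, 3))
print(nmse_db(1.1 * X, X))   # a uniform 10% error corresponds to about -20 dB
```

Because `np.linalg.norm` defaults to the Frobenius norm for 2-D arrays, no extra argument is needed.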

In the third experiment, we explore the role of the number of measurements. In this case, the number of training signals is $T = 500$, and the other parameters are the same as in the second experiment. The number of sign measurements $n$ varies from 100 to 500. Fig. 3 shows the NMSE versus the number of sign measurements. As the number of measurements increases, the performance of the proposed algorithms in recovering the original signal $\mathbf{X}$ improves. Moreover, both DL-BIHT-L2 and DL-BIHT-L1 significantly outperform the case without dictionary learning; in particular, when the number of measurements is 500, both algorithms achieve a performance gain of about 10 dB.

IV. Conclusion 

We have proposed a new iterative dictionary learning algorithm for noisy sparse signal reconstruction in the one bit compressed sensing framework when the sparse domain is unknown in advance. The algorithm has two steps. The first step is sparse signal recovery from one bit measurements, which is performed by the BIHT algorithm in this letter. The second step updates the dictionary matrix by minimizing a suitable cost function in the one bit compressed sensing framework, using a simple steepest-descent method to update the rows of the dictionary matrix. Simulation results show the effectiveness of the dictionary learning, both in the monotone convergence of the cost function and in estimating the original signals, especially as the number of training signals and the number of sign measurements increase.

IEEE SIGNAL PROCESSING LETTERS, VOL.XX, NO.X

Fig. 2. NMSE versus number of training signals.

Fig. 3. NMSE versus number of sign measurements.


Appendix A 

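Proof of the Convexity of the $\ell_2$-Norm Cost Function

To verify the convexity of $J(\mathbf{D}) = \sum_{i=1}^{T}\|\mathbf{y}_i - S(\mathbf{D}\mathbf{s}_i)\|_2^2$, where $S(x)$ is defined in (9), we examine its second derivative. First, we consider the first-order vector derivative with respect to the $j$th row $\mathbf{d}_j^T$ of $\mathbf{D}$. Since only the $j$th entry of each residual depends on $\mathbf{d}_j$, some simple calculations show that

$$\frac{\partial J(\mathbf{D})}{\partial \mathbf{d}_j} = \sum_{i=1}^{T} \frac{\partial}{\partial \mathbf{d}_j}\big(y_{ij} - S(\mathbf{d}_j^T\mathbf{s}_i)\big)^2. \tag{18}$$

Following some further manipulations, we obtain

$$\frac{\partial J(\mathbf{D})}{\partial \mathbf{d}_j} = \sum_{i=1}^{T} 2S'(\mathbf{d}_j^T\mathbf{s}_i)\big[-y_{ij} + S(\mathbf{d}_j^T\mathbf{s}_i)\big]\mathbf{s}_i. \tag{19}$$

Therefore, the scalar partial derivative with respect to the $k$th entry $d_{jk}$ of $\mathbf{d}_j$ is

$$\frac{\partial J(\mathbf{D})}{\partial d_{jk}} = \sum_{i=1}^{T}\Big[-2S'(\mathbf{d}_j^T\mathbf{s}_i)\,s_{ik}\,y_{ij} + 2S'(\mathbf{d}_j^T\mathbf{s}_i)\,s_{ik}\,S(\mathbf{d}_j^T\mathbf{s}_i)\Big]. \tag{20}$$

The second-order derivative is

$$\frac{\partial^2 J(\mathbf{D})}{\partial d_{jk}^2} = \sum_{i=1}^{T}\Big[-2s_{ik}\,y_{ij}\,\frac{\partial}{\partial d_{jk}}\big(S'(\mathbf{d}_j^T\mathbf{s}_i)\big) + 2s_{ik}\,\frac{\partial}{\partial d_{jk}}\big(S'(\mathbf{d}_j^T\mathbf{s}_i)\,S(\mathbf{d}_j^T\mathbf{s}_i)\big)\Big]. \tag{21}$$

The two partial derivatives in (21) are $\frac{\partial}{\partial d_{jk}}\big(S'(\mathbf{d}_j^T\mathbf{s}_i)\big) = S''(\mathbf{d}_j^T\mathbf{s}_i)\,s_{ik}$ and $\frac{\partial}{\partial d_{jk}}\big(S'(\mathbf{d}_j^T\mathbf{s}_i)\,S(\mathbf{d}_j^T\mathbf{s}_i)\big) = s_{ik}\,S''(\mathbf{d}_j^T\mathbf{s}_i)\,S(\mathbf{d}_j^T\mathbf{s}_i) + s_{ik}\big(S'(\mathbf{d}_j^T\mathbf{s}_i)\big)^2$. Replacing these two terms in (21) results in

$$\frac{\partial^2 J(\mathbf{D})}{\partial d_{jk}^2} = \sum_{i=1}^{T} 2s_{ik}^2\Big[S''(\mathbf{d}_j^T\mathbf{s}_i)\big(-y_{ij} + S(\mathbf{d}_j^T\mathbf{s}_i)\big) + \big(S'(\mathbf{d}_j^T\mathbf{s}_i)\big)^2\Big]. \tag{22}$$

Consider $S(x) = \frac{1 - e^{-x}}{1 + e^{-x}}$, $S'(x) = \frac{2e^{-x}}{(1 + e^{-x})^2}$, and $S''(x) = \frac{2e^{-x}(e^{-x} - 1)}{(1 + e^{-x})^3}$. It can be shown that, for both cases $y_{ij} = 1$ and $y_{ij} = -1$, the bracketed expression in the summation in (22) is positive for sign-consistent measurements. For example, consider the case $y_{ij} = 1$. Defining $x = \mathbf{d}_j^T\mathbf{s}_i$, some calculations give

$$S''(x)\big(-1 + S(x)\big) + \big(S'(x)\big)^2 = \frac{4e^{-2x}\,(2 - e^{-x})}{(1 + e^{-x})^4}, \tag{23}$$

which is positive for $x > -\ln 2$ and, in particular, for every sign-consistent $x > 0$. Now consider the case $y_{ij} = -1$. In this case, it can be shown that

$$S''(x)\big(1 + S(x)\big) + \big(S'(x)\big)^2 = \frac{4e^{-x}\,(2e^{-x} - 1)}{(1 + e^{-x})^4}, \tag{24}$$

which is positive for $x < \ln 2$ and, in particular, for every sign-consistent $x < 0$. Both conditions can be written as $y_{ij}\,\mathbf{d}_j^T\mathbf{s}_i > -\ln 2$. The same calculation applied to (19) gives the full Hessian with respect to $\mathbf{d}_j$ as $\sum_{i=1}^{T} 2\big[S''(\mathbf{d}_j^T\mathbf{s}_i)\big(-y_{ij} + S(\mathbf{d}_j^T\mathbf{s}_i)\big) + \big(S'(\mathbf{d}_j^T\mathbf{s}_i)\big)^2\big]\mathbf{s}_i\mathbf{s}_i^T$, which is a nonnegative combination of the positive semidefinite matrices $\mathbf{s}_i\mathbf{s}_i^T$ whenever these conditions hold. Since $J(\mathbf{D})$ is a sum of terms that each depend on a single row $\mathbf{d}_j$, this completes the proof of the convexity of $J(\mathbf{D})$ on the convex region $\{\mathbf{D} : y_{ij}\,\mathbf{d}_j^T\mathbf{s}_i > -\ln 2 \ \forall i, j\}$, which contains every sign-consistent $\mathbf{D}$.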
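The sign conditions behind the second-derivative argument are easy to check numerically. The sketch below is an illustration, not part of the proof: it evaluates the bracketed term of (22) on a grid using $S'(x) = (1 - S(x)^2)/2$ and $S''(x) = -S(x)S'(x)$, which follow from $S(x) = \tanh(x/2)$, compares it with the closed forms $4e^{-2x}(2 - e^{-x})/(1 + e^{-x})^4$ for $y = 1$ and $4e^{-x}(2e^{-x} - 1)/(1 + e^{-x})^4$ for $y = -1$ obtained by differentiating (9) directly, and confirms that the term is positive whenever $y\,x > -\ln 2$ and negative well outside that region. Helper names are hypothetical.

```python
import numpy as np

def S(x):
    return np.tanh(x / 2.0)            # equals (1 - e^{-x}) / (1 + e^{-x}) from (9)

def S1(x):
    return 0.5 * (1.0 - S(x) ** 2)     # S'(x)

def S2(x):
    return -S(x) * S1(x)               # S''(x)

def bracket(x, y):
    """Bracketed term of (22): S''(x)(-y + S(x)) + S'(x)^2."""
    return S2(x) * (-y + S(x)) + S1(x) ** 2

def closed_form(x, y):
    """Closed forms from direct differentiation of (9), with u = e^{-x}."""
    u = np.exp(-x)
    if y == 1:
        return 4.0 * u**2 * (2.0 - u) / (1.0 + u) ** 4
    return 4.0 * u * (2.0 * u - 1.0) / (1.0 + u) ** 4

x = np.linspace(-5.0, 5.0, 1001)
region = {y: y * x > -np.log(2.0) for y in (1, -1)}   # where the bracket should be positive
```

The region $y\,x > -\ln 2$ strictly contains the sign-consistent half-line, which is why the convexity statement is phrased in terms of sign-consistent measurements.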

Appendix B 

Proof of the Convexity of the $\ell_1$-Norm Cost Function

To prove the convexity of $Q(\mathbf{D}) = \sum_{i=1}^{T}\|\mathbf{y}_i - S(\mathbf{D}\mathbf{s}_i)\|_1$, where $S(x)$ is given in (9), we prove that each sub-optimization cost function $q(\mathbf{d}_k) = \sum_{i=1}^{T}|y_{ik} - S(\mathbf{d}_k^T\mathbf{s}_i)|$, $1 \le k \le n$, is convex. Throughout, we consider the noise-free case in which each measurement is sign-consistent, i.e., $y_{ik} = \mathrm{sign}(\mathbf{d}_k^T\mathbf{s}_i)$. Let $f(\mathbf{d}_k) = y_{ik} - S(\mathbf{d}_k^T\mathbf{s}_i)$; then $\frac{\partial f}{\partial \mathbf{d}_k} = -S'(\mathbf{d}_k^T\mathbf{s}_i)\,\mathbf{s}_i$ and $\frac{\partial^2 f}{\partial \mathbf{d}_k\,\partial \mathbf{d}_k^T} = -S''(\mathbf{d}_k^T\mathbf{s}_i)\,\mathbf{s}_i\mathbf{s}_i^T$ are the first- and second-order derivatives of $f(\mathbf{d}_k)$ with respect to $\mathbf{d}_k$, respectively. Hence, if $S''(\mathbf{d}_k^T\mathbf{s}_i) > 0$, then $f(\mathbf{d}_k)$ is concave; conversely, if $S''(\mathbf{d}_k^T\mathbf{s}_i) < 0$, then $f(\mathbf{d}_k)$ is convex. Let $z = \mathbf{d}_k^T\mathbf{s}_i$. If $S''(z) > 0$, then $z < 0$, and as a result $y_{ik} = -1$. Hence, $f(\mathbf{d}_k) = y_{ik} - S(z) = -1 - S(z)$ is negative. Using the composition property [?, p. 84], since $|\cdot|$ is convex and non-increasing for negative arguments and $f(\mathbf{d}_k)$ is concave, we conclude that $|f(\mathbf{d}_k)|$ is convex. If $S''(z) < 0$, then $z > 0$, and as a result $y_{ik} = 1$. Hence, $f(\mathbf{d}_k) = y_{ik} - S(z) = 1 - S(z)$ is positive. Using the composition property [?, p. 84], since $|\cdot|$ is convex and non-decreasing for positive arguments and $f(\mathbf{d}_k)$ is convex, we conclude that $|f(\mathbf{d}_k)|$ is convex. In both cases, because a sum of convex functions is convex, $q(\mathbf{d}_k)$ is convex, which results in the convexity of the cost function $Q(\mathbf{D})$.
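The composition argument above relies on sign-consistent measurements. The sketch below, an illustration with hypothetical helper names, checks numerically via central second differences that a single term $|y - S(z)|$ of $q(\mathbf{d}_k)$ is convex in $z = \mathbf{d}_k^T\mathbf{s}_i$ on the sign-consistent side ($z > 0$ for $y = 1$, $z < 0$ for $y = -1$) and concave on the opposite side, which is why that assumption matters.

```python
import numpy as np

def S(x):
    """S-shaped surrogate (9), computed as tanh(x/2)."""
    return np.tanh(x / 2.0)

def abs_term(z, y):
    """One term |y - S(z)| of q(d_k), viewed as a function of z = d_k^T s_i."""
    return np.abs(y - S(z))

def second_difference(f, z, h=1e-3):
    """Central second difference, approximating f''(z)."""
    return (f(z + h) - 2.0 * f(z) + f(z - h)) / h**2

z = np.linspace(0.05, 6.0, 200)
curv_consistent_pos = second_difference(lambda t: abs_term(t, 1.0), z)     # y = +1, z > 0
curv_consistent_neg = second_difference(lambda t: abs_term(t, -1.0), -z)   # y = -1, z < 0
curv_inconsistent = second_difference(lambda t: abs_term(t, 1.0), -z)      # y = +1, z < 0
```

Since $|y - S(z)|$ never changes sign for $y = \pm 1$ (because $|S(z)| < 1$), each term is smooth, and its curvature reduces to $\pm S''(z)$, matching the case analysis in the proof.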